Add pesto support for dynamic port forwarding via pasta control socket#755
Conversation
83c12a5 to
36bdb53
Compare
| } | ||
|
|
||
| args := portMappingsToPestoArgs(ports) | ||
| args = append(args, socketPath) |
There was a problem hiding this comment.
No, it cannot be. It should be checked by the caller. Should I add a check to be sure?
| // does NOT perform DNAT to the container. Netavark handles that inside the | ||
| // netns. Therefore each mapping uses HostPort as both source and destination | ||
| // (e.g. "-t 0.0.0.0/8080") so traffic arrives at the port netavark expects. | ||
| func portMappingsToPestoArgs(ports []types.PortMapping) []string { |
There was a problem hiding this comment.
does this func make sense if ports is empty? i think you will get something in args that is not expected?
There was a problem hiding this comment.
I think this makes sense. If no ports are specified, it will set -t none -u none, meaning no ports will be forwarded.
|
Packit jobs failed. @containers/packit-build please check. |
|
/packit rebuild-failed |
1 similar comment
|
/packit rebuild-failed |
6d74679 to
8e9f756
Compare
|
Since GH PR reviews are down posted here in the hope it works Not supporting ipv6 here is a hard blocker which we cannot ship with as this would be a major regression. ipv6 dnat work for all addresses except Testing locally with the --no-splice option even the real problem is not packets are send via link local address with is never routed fe80::8880:e9ff:fead:a5cc is the link local inside the rootless netns But I do have an actual global address assigned as well So why is pasta not forwarding to that address? If it would do that then it should work I think Then there is the larger design issue that comes due stripping off the host ip on the podman side in the rootless netns. All packets arrive only on the tap interface in pasta so the netns has no longer any way to route them to the right container ip as the real host ip info is lost at that point, i.e. try something like this all, request will land in one container which is bad and I Am not sure how we could even avoid this here. It is essentially this machine problem containers/podman#14928, we would need to remap ports in the rootless netns dynamically but this is ugly as we need to keep state and it also means we can run out of ports if users forward a lot of ports on many ips on the host but the pasta tap0 interface would only have one set so we get capped at 65535 I guess |
@sbrivio-rh @dgibson I think you talked about the flow table possibly having a destination ip. Is that something that would work today already? I guess it could used to avoid the issue if podman passes the final container ip right with the port mappings and then skip the dnat firewall rules completely because pasta handles it directly in one go |
I just noticed I tested with the wrong network without ipv6 enabled, we must test with a network created with |
I just realised that this might be due to Podman passing I guess we might want to maintain the original scope instead, that is, if the original destination address wasn't a link-local address, we shouldn't convert that. I didn't really think it through, but this definitely looks like something we can fix in pasta.
Ah, yes, that sounds like a good idea! We don't have the interface for that yet, and the destination address is not represented yet in struct fwd_rule, but at this point it should be straightforward to add (I didn't try yet). We happened to discuss the command line and pesto syntax for that, the seemingly obvious choice would be the same syntax as we use to match on the original destination address, but after the forwards traffic originally directed to 192.0.2.1 and port 8022 to port 22 in the container, with the inbound NAT logic I referenced above, we would allow a syntax like: to also specify a mapped destination IP address. I can give that a try tomorrow, it doesn't really look complicated at this stage. |
If that's the case, the patch at https://archives.passt.top/passt-dev/20260507043149.1989693-1-sbrivio@redhat.com/ ( I'm looking into the second point meanwhile (which might supersede this one anyway if I understood correctly). |
I sketched destination address mapping at https://archives.passt.top/passt-dev/20260507055036.2110260-1-sbrivio@redhat.com/ ( but it can be combined with all the other bits, that is, ranges, (outer) interface names, address binding, etc. Let me know if it's useful, I'll try to quickly publish a complete version if it is. |
I think so, but I would also like to hear from @Luap99. |
I think it would be useful to have this, but then if we directly NAT in pasta to the container ip we need to rework our approach here I believe. We need the container destination ip and due the nature of podman network connect/disconnect we would need to update that like we have with the current hacky rootlessport thing. IMO it is not really that maintainable the way the code is structured today, ideally we would move the pesto port setup/teardown into the network Setup() Teardown() in the libnetwork/netavark interface here and not need to do anythign special in podman, though we likely still need to send some option to specify if this is a network connect/disconnect vs initial setup/final teardown. I am afraid we run out of time for 6.0 with larger changes like this.
So how does this work regarding v4/v6 mappings? The host side is a dual stack socket so if pasta handles a v6 request will it NAT that down to v4 with target 192.0.2.1? AFAIK so far pasta has always kept the v4 -> v4/ v6 -> relationship? Must we split the host mapping into two to avoid this issue? |
|
How do people feel if we add a new containers.conf option rootless_network_port_forwarder="pasta|rootlessport" and then keep the current logic in addition to the new one and switch based on this option? Use rootlessport as default and call the pasta mode experimental for now until we have time to figure out and fix all edge cases. That would give us more time until 6.1, we could flip the default there if we are confident that this works properly and users who have issues could revert the setting. |
I don't know how it should work, I didn't really think about it. The way I sketched it should actually map any TCP connection, regardless of the IP version, to 192.0.2.1. I'm not sure if it works at all with IPv6. Proposals / wishes warmly welcome.
It should, yes.
Right. I would break that here. Is that an issue or a feature? There are other options as well:
That would definitely be less ambiguous.
I think it's a good idea in general. I'm just a bit worried that users might expect source addresses to be finally preserved, but it doesn't happen and we'll get flooded with bug reports. But I guess it's anyway the best option for users, especially if it's acceptable to flip the default later. |
|
I agree with the new I've made changes on the Podman side and wrote e2e tests covering the scenarios discussed here. Here is a commit https://github.com/containers/podman/pull/28478/changes/b3c787e1d10078f01c4c17f8c308a68bbaf780dcwith prototype changes to make the tests pass. I will do the cleanup next week. @Luap99 let me know if I'm missing any scenarios. Also, I tried using AI to solve the issue on Passt side. Here's a summary of what needs fixing on the Passt side. For the record, I did Passt changes with full AI assist. Destination Address Mapping: Bug FixesDestination Address Mapping: Bug FixesFixes for the destination address mapping HACK patch Files Modified
Bug 1: IPv6 Source Address Parsing BrokenFile: Symptom: Root cause: The condition that determines whether the first if ((spec = strchr(buf, '/')) &&
strchr(spec, ':') == strchr(buf, ':')) {For input Fix: Expand the condition to recognize IPv6 addresses by their leading if ((spec = strchr(buf, '/')) &&
(strchr(spec, ':') == strchr(buf, ':') ||
buf[0] == ':' || buf[0] == '[')) {If the buffer starts with Bug 2: Splice Path Ignores Destination Address MappingFile: Symptom: Traffic with destination mapping (e.g., Root cause: The original splice path in Fix (IPv4): Skip the splice path when Fix (IPv6): Allow the splice path for IPv6 even with The condition is: if (!c->no_splice && inany_is_loopback(&ini->eaddr) &&
(inany_is_unspecified(&rule->daddr) || !inany_v4(&ini->eaddr)) &&
(proto == IPPROTO_TCP || proto == IPPROTO_UDP)) {This means:
Bug 3: Multiple Rules for Same Port With Different Source AddressesSymptom: Adding two rules for the same port with different source addresses: Only the first rule worked. Connections to Root cause: This was a consequence of Bug 2. The splice path ignored Fix: Resolved by the Bug 2 fix. With IPv4 daddr connections routed through Bug 4: IPv6 TAP Path Source Address UnreachableFile: Symptom: IPv6 connections with daddr (e.g., Root cause: When the TAP path is used for IPv6 with daddr:
Fix: Modified the IPv6 fallback in the TAP path to use } else if (!inany_is_unspecified(&rule->daddr) &&
!inany_is_linklocal6(&rule->daddr)) {
tgt->oaddr.a6 = c->ip6.addr;
} else {
tgt->oaddr.a6 = c->ip6.our_tap_ll;
}This is a safety fallback — in practice, IPv6 with daddr now uses splice Bug 5: IPv6 Splice Source IP Shows Bridge GatewayFiles: Symptom: IPv6 forwarding with daddr reached the container, but the source Expected (like IPv4): Root cause: The splice path set Fix (part 1 — if (!inany_is_unspecified(&rule->daddr)) {
tgt->eaddr = rule->daddr;
if (inany_v4(&tgt->eaddr))
tgt->oaddr = inany_any4;
else
tgt->oaddr.a6 = c->ip6.addr_seen;
}Fix (part 2 — if (!inany_is_unspecified(&tgt->oaddr)) {
union sockaddr_inany bind_sa;
pif_sockaddr(c, &bind_sa, tgtpif, &tgt->oaddr, tgt->oport);
if (bind(conn->s[1], &bind_sa.sa, socklen_inany(&bind_sa)))
flow_trace(conn, "Can't bind splice socket: %s",
strerror_(errno));
}This forces the splice connection to originate from pasta's TAP address Summary of Path Selection
@sbrivio-rh What do you think about changes to Passt? |
I don't think they are particularly useful at the moment. That's a draft I hacked up quickly and kept it light on purpose (with no error handling, without adjusting other functions in fwd.c, etc.). We need to define the behaviour we want first (based on what we think would be least surprising for users, and also as generic / powerful as possible), see my comment above. Then we can add a proper implementation. I shared that draft in the hope it would help you speeding up integration and to check how to cover specific aspects that @Luap99 brought up. If you have any specific issue let me know and I'll be happy to fix it. I also have the feeling that we shouldn't use LLMs to communicate about these design topics, a bit because of the policy (https://github.com/containers/podman/blob/main/LLM_POLICY.md#no-llm-generated-direct-communication), and a bit because we would just regress to average, which isn't exactly what we want here as we have an original design question to answer in the best possible way. Other than that, let me know if I can help in any way, or if you have preferences or input about the possible behaviours I listed above. Thanks! |
My intention was to get a head start on the integration. I tried using AI to resolve the issues on the Pasta side because I am unfamiliar with the codebase, and it seemed like the fastest workaround. I just wanted to share my progress, but I completely agree that relying on AI for this isn't the correct approach. |
3ca92f8 to
e1e3e04
Compare
|
@Luap99 I've made the adjustments and added the What are the next steps for podman 6.1?
|
| logrus.Warnf("pesto: unsupported protocol %q, skipping", protocol) | ||
| continue |
| switch c.RootlessPortForwarder { | ||
| case "", "rootlessport", "pasta": | ||
| default: | ||
| return fmt.Errorf("invalid rootless_port_forwarder value %q, must be \"rootlessport\" or \"pasta\"", c.RootlessPortForwarder) | ||
| } |
There was a problem hiding this comment.
I prefer validation not to be done as part of init code here, this switch should be in-lined into the podman caller and we can error if we actually end up on that code path instead of for every single podman command
| // bridge networks. "rootlessport" (default) uses a userspace TCP/UDP proxy. | ||
| // "pasta" (experimental) uses pasta's kernel splice for forwarding, which | ||
| // preserves the original source IP address inside the container. | ||
| RootlessPortForwarder string `toml:"rootless_port_forwarder,omitempty"` |
There was a problem hiding this comment.
A new config file field needs docs in common/docs/containers.conf.5.md and common/pkg/config/containers.conf
Fixes: https://redhat.atlassian.net/browse/RUN-2214 Fixes: containers/podman#8193 Fixes: https://redhat.atlassian.net/browse/RUN-3587 Signed-off-by: Jan Rodák <hony.com@seznam.cz>
6a1b221 to
5680642
Compare
Fixes: https://redhat.atlassian.net/browse/RUN-2214 Fixes: containers/podman#8193 Fixes: https://redhat.atlassian.net/browse/RUN-3587 Signed-off-by: Jan Rodák <hony.com@seznam.cz>
Fixes: https://redhat.atlassian.net/browse/RUN-2214 Fixes: containers/podman#8193 Fixes: https://redhat.atlassian.net/browse/RUN-3587 Signed-off-by: Jan Rodák <hony.com@seznam.cz>
|
PTAL @containers/podman-maintainers @containers/podman-reviewers |
|
[snip]
I haven't yet fully understood the picture here. But the short answer here is that we're picking the link-local address for the destination because the source address is link-local. So the next question is why the (translated?) source address is link local. What is the source address of the connection on the outside? |
Just for clarity, as I also noted in https://archives.passt.top/passt-dev/20260515005015.6e23cc47@elisabeth/, this isn't really relevant anymore for the pull request at hand (see also #755 (comment)). If @Luap99 happens to have a quick answer / remember about this test it would still be nice to have this information, though. |
Well that is the thing I used a global ip on the host side, does the kernel remap it before pasta sees it somehow? quick pasta example without relying on the new code here. terminal 1 which is the ipv6 with global scope. that is also assigned in the container terminal 2 so the source is fe80... link-local inside the ns, when I look at the tcpdump on the host the interesting thing is that the source ip is correct I guess it is fine to ignore for now. FWIW I was able to confirm actual external connects from a second system were correctly using the global address, so the worst case is only connections from the host are not working via ip6. |
I don't think so.
Aha! I'd been assuming the external connection was coming from somewhere else, rather than from the host on the address shared with the container. This curl will (in most circumstances) use that address as both the source and destination. IIUC, podman invokes pasta with an IPv4 [snip]
It's arguably doing the wrong thing, but it's not related to the host interface. There's essentiall no mechanism by which that could affect things. The relevant code is this stanza in
I'll grant that's kind of weird, but what should we do instead? We could simply drop the connection, but that doesn't seem desirable either. To translate without altering address scope, we need a global scope address we can use. You can specify that with Doing better automatically is tricky, because we don't really have the power to allocate global addresses anywhere. We can't use the gateway address as we do for IPv4, because that will nearly always also be link-local, so it will have the same problem. Maybe we search for a free RFC4193 address ( |
| lock: lock, | ||
| syslog: conf.Syslog, | ||
| rootlessNetns: netns, | ||
| rootlessPortForwarder: conf.Config.Network.RootlessPortForwarder, |
There was a problem hiding this comment.
Not a change for this PR, but it would be nice to have a follow-up that puts these in alpha order.
|
LGTM |
|
/lgtm |
|
It seems that |
pesto_linux.go) that invokes the pesto binary to dynamically update pasta's port forwarding table via a UNIX domain socket, enabling rootless bridge containers to add/remove port mappings without restarting pasta.-c <socketPath>to enable the pesto control channel and exposePestoSocketPathinRootlessNetnsInfo.Related to: containers/podman#28478
Fixes: https://redhat.atlassian.net/browse/RUN-2214
Fixes: containers/podman#8193
Fixes: https://redhat.atlassian.net/browse/RUN-3587